Computing with Residue Numbers in High-Dimensional Representation

We introduce Residue Hyperdimensional Computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using vastly fewer resources than previous methods, and it exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.


Introduction
The problem of representing and computing on high-dimensional representations of numerical values-such as position, velocity, and color-is central to both machine learning and computational neuroscience.In machine learning, vector representations of numbers are useful for defining position or function encodings in neural networks [1]- [3], improving robustness to adversarial examples [4], and generating efficient classifiers [5].In neuroscience, experimentalists seek to understand how populations of neurons in the brain represent and transform perceptual or cognitive variables, and so numerous theorists have constructed models for how these variables could be encoded in and decoded from high-dimensional vector encodings (e.g., [6]- [8]).
A particularly salient example of high-dimensional representation in neuroscience is the 'grid cell' encoding of spatial position in the medial entorhinal cortex [9].Grid cells have multiple peaks in their firing rates that correlate with spatial positions arranged in a hexagonal lattice.While somewhat perplexing at first glance, the usefulness of such a coding scheme becomes apparent from how it functions as a population code.In comparison to a population of neurons with traditional unimodal encoding functions whose coding resolution increases linearly with the number of neurons, a grid cell population possesses a coding resolution that grows exponentially in the number of neurons [10].In particular, Fiete and colleagues have emphasized that this advantage of grid cell encoding utilizes properties of residue numbers (see Section 2.2 below) [11].
Inspired by this observation, we propose a comprehensive algebraic framework for distributed neural computation based on residue number systems.Our novel framework builds on an existing algebraic framework for computing with high-dimensional random vectors.The idea was originally called computing in Holographic Reduced Representation [12] and is now referred to also as Vector Symbolic Architectures (VSA) [13] and Hyperdimensional Computing [14].We call our new framework Residue Hyperdimensional Computing (RHC) and demonstrate that it inherits the computational advantages of both standard residue number systems and hyperdimensional computing.This enables fault-tolerant computations that can efficiently represent numbers and search over a large dynamic range with greatly reduced memory requirements.Furthermore, as we shall see, the new framework provides a useful formalism for understanding computations in grid cells.
To summarize, we list the four key coding properties we achieve with Residue Hyperdimensional Computing: (1) Algebraic structure: Simple operations on vectors perform addition and multiplication on encoded values, (2) Expressivity: Feasible encoding range scales better than linearly with dimension, (3) Efficient decoding: Required resources to decode scale better than linearly with encoding range, and (4) Robustness to noise.Although a number of previously proposed models achieve some of these properties (Table S.1, Supplement A), Residue Hyperdimensional Computing is the first, to our knowledge, to achieve all four of these desiderata, as we shall now show.

Results
We first define the key concepts on which the Residue Hyperdimensional Computing framework is based (Section 2.1) and then describe the framework fully in Section 2.2.We then demonstrate its favorable encoding, decoding, and robustness properties (Section 2.3), as well as how it can be extended to multiple dimensions (Section 2.4) and sub-integer encodings (Section 2.5).Of particular note, we construct a hexagonal residue encoding system, analogous to grid cell coordinates, that provides higher spatial resolution than square lattices.Finally, we describe how the framework can be applied to problems in visual scene analysis and combinatorial optimization (Section 2.6).

Preliminary definitions
Definition 2.1.1.A Residue Number System [15] encodes an integer x ∈ Z by its value modulo {m 1 , m 2 , . . ., m K }, where the m k are the moduli of the system.For example, relative to moduli {3, 5, 7}, x = 20 would be encoded by the residue [2, 0, 6]-i.e., [20 mod 3, 20 mod 5, 20 mod 7].The Chinese Remainder Theorem states that if the moduli are pairwise co-prime, then for any x such that 0 ≤ x < M := k m k , the integer is uniquely encoded by its residue [16].From here on, we will assume that the pairwise coprime condition is fulfilled.Definition 2.1.2.Fractional Power Encoding (FPE) [12] defines a randomized mapping from an integer x to a high-dimensional vector z(x).Let D be the dimension of the vector, with D typically in the range 10 2 ≤ D ≤ 10 4 .The encoding scheme involves two steps.First, draw a random base vector, z, defined as follows: where each element e iϕ j is a complex number with unit magnitude (a phasor ), and each ϕ j is a random sample from a specified probability distribution.Second, define a function from x to C D via component-wise exponentiation of the random vector: , is a function χ × χ → R that measures the similarity between two objects in a set χ (e.g., vectors in R n ).Notably, FPE implements kernel approximation [17], which is widely used in machine learning [18].More specifically, we can induce a kernel based on the inner products of FPEs: where z(x 2 ) is the complex conjugate of z(x 2 ).This defines a translation-invariant kernel K(∆x) (where ∆x = x 1 − x 2 ), which converges to a particular K * (∆x) as D → ∞, where the shape of the kernel is determined by the probability distribution used to draw z [19], [20].

Residue Hyperdimensional Computing
We now introduce how FPE can implement a residue number system.As a first step, we explain how FPE can implement congruence (representing a remainder, modulo m).
Definition 2.2.1.Fractional Power Encoding, modulo m: For z m , let the support of our probability distribution for the ϕ j be the m-th roots of unity.In other words, each ϕ j must be a multiple of 2π/m.Then congruent values are mapped to the same vector: because e iϕ j m = e 2π•i•k for some integer k, and e 2π•i•k = 1 for any integer k.Put another way, z m (x) is a representation (in the abstract algebraic sense) of the additive group of integers modulo m.
The kernel induced by z m (∆x) is 1 if ∆x = 0 (mod m), and ≈ 0 otherwise, as shown in Figure 1a.This is highly useful, because it implies that distinct integers behave as quasi-orthogonal vectors, just like symbols in hyperdimensional computing.Unlike symbols, however, we can perform algebraic manipulations transforming one integer into another.Definition 2.2.2.Residue Hyperdimensional Computing: Let z m 1 , z m 2 , . . ., z m K denote FPE vectors with moduli m 1 , m 2 , . . ., m K respectively.Let ⊙ denote component-wise multiplication (a.k.a.Hadamard product).Then, we encode an integer x by combining our modulo representations via the Hadamard product: The above encoding represents the remainder of x, because each z m k represents its value modulo m k .The code is fully distributed, as every element of the vector z contains information about each encoding vector z m k (x).By contrast, typical implementations of residue number systems compartmentalize information about individual moduli [21].
The kernel induced by z(∆x) is 1 if ∆x = 0 (mod k m k = M ), and ≈ 0 for other integer intervals ∆x (Figure 1b).This means that the kernel maps different remainders of our residue number system to quasi-orthogonal directions in high-dimensional vector space, and enables computing in superposition over these variables.Examples of possible applications enabled by such a scheme are presented in Section 2.6.
The hallmark of a residue number system is carry-free arithmetic; that is, addition, subtraction, and multiplication can be performed component-wise on the remainders.This enables residue number systems to be highly parallel, avoiding carryover operations required in binary number systems.Residue Hyperdimensional Computing implements arithmetic with component-wise operations, thus inheriting the premier computational property of residue number systems.
Addition is defined as the Hadamard product between vectors, that is z( 1).This follows from the fact that component-wise multiplication of phasors corresponds to phase addition, and that component-wise multiplication is commutative.Subtraction is defined by addition of the additive inverse.
Next, we define a second binding operation that implements multiplication, denoted as Just as variable addition is implemented by element-wise multiplication, variable multiplication is implemented by another element-wise operation, this one involving exponentiation (Methods 4.1.2).Here, it is crucial that our encoding function is restricted to integers as its domain, because multiplication is commutative and integer powers commute (that is, (c x 1 ) x 2 = c x 1 x 2 = (c x 2 ) x 1 for c ∈ C and integers x 1 , x 2 ).We show how this definition for integer multiplication can be generalized to multiplication for vector encodings z(x 1 ) and z(x 2 ), without invoking the costly step of first decoding the integers back from the vectors.
Division is not well-defined for residue number systems, because integers are not closed under division.Still, when a modulus m k is prime, multiplications by non-zero integers are invertible, because each non-zero integer has a unique multiplicative inverse with respect to m k .
The existence of two distinct binding operators for addition and multiplication is a new contribution to hyperdimensional computing.Previous formulations only supported addition, or only multiplication via addition after taking logarithmic transformations [22].Consequently, we can now formulate a fully distributed residue number system that inherits the benefits of computing with high-dimensional vectors.

The resonator network enables efficient decoding of residue numbers
Given the vector representation of a residue number, z(x), how do we decode it to recover its value, x?One method for decoding commonly used in hyperdimensional computing is codebook decoding [23], which involves taking the inner product of z(x) with a set of M codebook vectors with known values of x.However, this procedure requires O(M * D) storage and M inner product evaluations.Fortunately, we can improve the situation by utilizing the fact that residue numbers break the dynamic range of a number into a set of smaller numbers, each with lower overall dynamic range.For example, a number with dynamic range of 105, when represented modulo [3,5,7], consumes a total dynamic range of 3+5+7 = 15.This in turn reduces both the memory and computation resources for decoding by a factor of 105/15 = 7.To make this work though requires that we invert Equation 4-that is, we must factorize z(x) into the set of constituent vectors {z m 1 (x), z m 2 (x), . . ., z m K (x)} representing x modulo m k , from which x can be easily recovered.For this we can use a resonator network [24], [25], a recently discovered method for efficiently factorizing vectors in hyperdimensional computing.Figure 2a shows that for a range of M values, the resonator network can recover vectors over an order of magnitude faster than standard codebook decoding.Two parameters that contribute to this are the vector dimension (D) and number of moduli (K).
To evaluate the dependence of resonator decoding on vector dimension, we fix the number of moduli (K = 2) and calculate the empirical accuracy (Figure 2b) of the resonator on the effective range M .We find that for a fixed D, the accuracy remains almost perfect up to a certain range of M , after which accuracy rapidly decays.To evaluate scaling with D, we define the capacity C of a dimension to be the largest M up to which empirical accuracy is at least 95 percent.We find that the scaling of C(D) is well-fit with a quadratic polynomial (Figure 2c), consistent with previous scaling laws studied for a resonator network with two states per component [25].Further tests with higher dimensions would help confirm quadratic scaling, but even the linear scaling has high slope (and note that C(4096) > .d, Resonator network performance is slightly worse for a higher number of moduli, K, but e, an advantage of lower moduli is a much higher effective range given encoding resources.f, The capacity of the resonator network remains high even in the presence of large amounts of phase noise.

× 10 6 ).
To evaluate dependence on the number of moduli, we fix D = 1024 and vary K.We find that resonator capacity decreases as K increases (Figure 2d), also consistent with prior work [25].Still, we emphasize that resonators with higher K have two advantages: decreased computation per decoding (Figure 2a), and decreased memory requirements.The resonator requires only k m k = b codebook vectors, rather than k m k = M .This means that increasing K can increase the effective range by several orders of magnitude given a fixed codebook budget (Figure 2e).Remarkably, the maximal M for a given b is given by Landau's function g(b), which scales as g(b) = e (1+o(1)) √ b ln b [26].This implies an exponential scaling between storage required and effective range, if we proffer sufficient K to achieve it.
Finally, we evaluate how robust resonator network decoding is to noise.We draw phase noise from a von Mises distribution with mean 0 and concentration κ; higher κ indicates less noise.In Figure 2f, we observe that performance degrades gradually as a function of noise, yet capacity remains remarkably high even at high noise levels.

Generalization to multiple dimensions 2.4.1 Cartesian representations of Z n
Next, we generalize Residue Hyperdimensional Computing from scalars to multi-dimensional variables, showing that the core operations and principles still apply.Let x ∈ Z n be a lowdimensional vector.Let x 1 , x 2 , . . ., x n denote the components of x.To encode x with a single high-dimensional vector z ∈ V D (D >> n) we form the encoding z(x) by taking the Hadamard product of the distributed representations of individual components: where each z i is a random vector as generated for the residue representation of a single number in Equation ( 4).(Each z i is created from binding multiple vectors for each of the Figure 3: Definition of a residue number system in non-negative hexagonal coordinates.a and c show discrete phase distributions chosen for a 3D coordinate system of a 2D space with moduli 3 and 5, respectively.In addition to requiring that phases are drawn from the m-th roots of unity, we enforce that the three phases sum to 0 (mod 2π).b and d show the respective kernels generated by these phase distributions.e, An example of the Voronoi tesselation of different states composed of a hexagonal coordinate system with modulus 5.Each color corresponds to a different state representation in the vector space of integers.f, The kernel induced by a hexagonal residue HD vector (period of 3 • 5 = 15).g, Compared to square encodings of space, hexagonal encodings approximately triple the effective range of encodable states with only a 50 percent increase in required storage space.h, The Shannon entropy of the hexagonal code is higher than that of the square code with the same modulus, reflecting the benefit of hexagonal packing.
moduli, distinct for each dimension i.) Since the binding operation is commutative, we can rearrange our terms to show the following useful properties hold: (a) The Hadamard product (⊙) performs vector addition: (b) The multiplicative binding operation (⋆) performs component-wise multiplication of x and (c) The kernel induced by z is the product of the kernels of the individual components: , where ∆x = x − x ′ .While (a) and (c) are general properties of FPE, (b) is once again unique to Residue Hyperdimensional Computing.In addition, the savings in decoding resources and computation required (Figure 2) also scale in higher dimensions, as utilized in Section 2.6.

Hexagonal coordinate systems
When working in a multi-dimensional space, there are multiple alternatives to a Cartesian coordinate system.For example, grid cells in medial entorhinal cortex encode spatial location with a hexagonal coordinate system; research in theoretical neuroscience suggests that this is because such a tiling of space has the highest resolution (Fisher information) in 2D space [27].As an illustrative example, we show that Residue Hyperdimensional Computing can also implement hexagonal coordinate systems, and that such a hexagonal lattice retains coding advantages over square lattices.Of particular note, we formulate a self-consistent encoding that extends a residue number system to non-negative hexagonal coordinates.
To encode a two-dimensional position into a three-coordinate frame requires two steps.First, we project the 2D vector x into a 3D vector y with unit vectors whose angles each differ by 2π  3 (Methods 4.3).This coordinate representation is known as the 'Mercedes-Benz' frame in R 2 ; it is well-studied in signal processing [28] and has attracted recent interest in hyperdimensional computing (e.g., [19], [29]).For our purposes, this step is necessary, but not sufficient, because projections can result in negative values.Second, to rectify values, we encode y with the method described in Section 2.4.1, but with the additional constraint that z([1, 1, 1]) = z([0, 0, 0]), ensuring that every state with negative coordinates has an equivalent representation to one with non-negative coordinates (Figures 3a and 3c, Methods 4.3).This constraint also reflects the fact that equal movement in every direction cancels out, and thus it enforces that different paths to the same 2D position result in the same high-dimensional encoding.The kernels induced by vectors of individual moduli (Figures 3b  and 3d) and by the residue vector (Figure 3f) exhibit the six-fold symmetry characteristic of hexagonal lattices and grid cells [9].
We can, therefore, represent the hexagonal coordinate system with a Voronoi tesselation (Figure 3e) in which different regions of space are mapped to their nearest integer-valued 3D coordinate.A hexagonal system with modulus m has 3m 2 − 3m + 1 distinct states and requires 3m codebook vectors, whereas a square lattice has m 2 distinct states and requires 2m codebook vectors (Figure 3g).Thus, the hexagonal system achieves better spatial resolution (it is a higher entropy code with regards to space) than a square lattice (Figure 3h) for the same number of resources.

Extensions to sub-integer decoding resolution
In previous sections, we worked exclusively with integer states and residue number systems implementing them.Intriguingly, however, we can extend our definition of FPE to rational numbers (Methods 4.4.1), and the resonator converges to FPE encodings of non-integers, even when codebooks contain only encodings of integers (Figures 4a and b).Strictly speaking, such extensions beyond integers are not residue number systems, and multiplicative binding is no longer well-defined.However, extensions towards sub-integer resolution have been considered in theoretical analyses of grid cells, e.g., in [11], [30], and we show that the resonator dynamics achieve this sub-integer resolution.
An efficient procedure for decoding with sub-integer precision is suggested by Figure 4b.The inner products between codebook states and the final resonator network state are welldescribed by evaluations of a Dirac comb convolved with a sinc function (Supplement B).For integer encodings, this function would evaluate to 0 for all features not near the peak, but for non-integers this is no longer the case.Still, we can find the sub-integer offset that best matches the resonator network state in order to decode the sub-integer value (Methods 4.4.2).
Phase noise is the limiting factor in decoding sub-integer states.To quantify this more rigorously, we evaluate a resonator network with varying effective ranges M under different noise regimes (κ = 16 and 1, respectively).We then split each unit interval into r partitions, so that there are M • r distinct numbers represented.Figures 4c and 4e show the accuracy of decoding for a different number of partitions with κ = 16 and κ = 1, respectively.To account for accuracy and the number of different states distinguished, we also report the 'bits per vector' metric [31] in Figures 4d and 4f.This metric validates that with lower noise, we can more reliably decode a higher number of states.

Efficient disentangling of object shape and pose from images
Here we study the disentangling problem in vision-that is, the task of recovering the underlying components of a scene given only the image pixel values.Such problems abound in visual perception; examples include inferring structure-from-motion or separating spectral reflectance properties from shading due to 3D shape and lighting.How brains disentangle these factors of variation in general is unknown, and it is computationally challenging due to the combinatorial explosion in how factors combine to create any given scene [32].
Here we demonstrate how Residue Hyperdimensional Computing can efficiently tackle the combinatorial complexity inherent in visual perception by considering a simple case of images containing three factors of variation: object shape, horizontal position, and vertical position.Let O, H, V be finite sets listing the possible vectors for each factor.The goal is to infer the We solve this problem in two stages.First, we form a latent, feature-based representation of the image via convolutional sparse coding (Methods 4.5).This step mirrors the neural representation in primary visual cortex, which is hypothesized to describe image content in terms of a small number of image features [33].We observe that this step is useful as it helps to decorrelate image patterns, thus achieving higher accuracy and faster convergence for the resonator network.
Second, we encode the latent feature representation into a high-dimensional vector that can be subsequently factorized into its components (O a , H b , V c ) via a resonator network.This is accomplished, following [34], by superimposing the residue number encodings of the position of each image feature into a single scene vector, s (Methods 4.5, equation 10).The resulting vector, s, can be expressed equivalently as a product of vectors representing object shape and position, and thus the problem of disentangling these factors essentially amounts to a vector factorization problem.
The standard way to factorize the scene vector (e.g., as in [34]) would be to use three codebooks corresponding to shape, horizontal position and vertical position, for a total of 10 + 105 * 2 = 220 codebook vectors.By contrast, a residue number system with moduli {3, 5, 7} uses 7 factors but only 10 + (3 + 5 + 7) * 2 = 40 vectors.Example runs of both problem setups are shown in Figure 5b.
Figure 5c demonstrates the two main advantages of the residue resonator compared to the standard resonator baseline: a reduction in both memory requirements (as just described) and the required number of iterations.Whereas the standard resonator takes over ≈ 2,000 codebook evaluations on average in our simulations, the residue resonator averages only ≈ 800 codebook evaluations.(Both dramatically improve over the brute force search, which requires 110,250 codebook evaluations).The key lesson is that this simple change to a resonator network leads to a multiplicative decrease in the number of computations required.

Generating exact solutions to the subset sum problem
Here we apply Residue Hyperdimensional Computing to the subset sum problem.Formally, the problem asks if a multiset of integers, S, contains a subset S * that sums to a target integer T .A further demand is to return S * if it exists.When all integers are positive, the subset-sum problem is NP-complete [35].It is a useful case to consider because there are well-known polynomial-time reductions from other NP-complete problems (e.g., 3-SAT) to subset sum [36].
To find solutions to the subset sum problem, we encode T as a vector, z(T ), and use |S| factors.Each factor, F k , has a codebook of two items: an identity vector z(0) and z(S k )reflecting the binary decision to include that item in the sum or not. Figure 6a demonstrates the resonator network successfully finding the solution when T = 21 and |S| = 6.
In order to use a residue number system, we need to choose M so that M > s∈S s.This means that we need ⌈log M ⌉ bits per unit.This is an improvement from previous work using resonator networks to solve factorization problems [37], which requires floating point precision to perform semi-prime factorization.
To understand the scaling capacity of the residue number system, we evaluate the performance of the resonator network as the set size increases.We observe that the resonator network finds exact solutions to the subset sum problem for large sets, and that performance improves with higher vector dimension (Figure 6b). Figure 6c illustrates that the success probability after up to 10 trials matches what is expected from 10 independent runs of the 1-trial accuracy.This finding suggests that the resonator network constitutes a 'Las Vegas' algorithm [38], in which each run has a success probability p, p is independent across runs, and so the algorithm requires 1/p iterations on average.Accuracy also depends on the integer range searched over, even for the same set size (Figure 6d), perhaps because larger integer ranges reduce the probability of multiple subsets matching the target.
Finally, we compare our subset sum algorithm to brute force search and an exponentialtime algorithm that solves the decision problem.The average number of iterations required by the resonator network to find a solution is drastically less than the exponentially increasing cost of a brute force search (Figure 6e) and also improves with higher dimension.We find that on a CPU, the resonator network has faster clock time than the exponential time algorithm for |S| > 28 (Figure 6f, Methods 4.6); most of the compute time is spent on generating the vector representations, rather than the resonator network dynamics itself.More significantly, whereas the baseline algorithm required O(2 |S| ) memory to keep candidate subsets in memory, the resonator network only requires O(D • |S|) memory, since it never needs to explicitly represent every subset.We emphasize that the CPU implementation of the The resonator network converges to the correct solution after a few iterations.b, Performance of the resonator network on randomly selected subset sum problems with a fixed set size.c, The probability of success scales as expected with independent trials on each start.d, Success of the resonator network on the subset sum problem depends on the range of items indexed in the subset sum problem (larger ranges are harder to factorize).e, Performance of the resonator network in terms of the expected number of iterations compared to chance.The expected number of iterations for the resonator network scales favorably compared to brute force search and improves with higher encoding dimension.f, Comparison of average compute time of the resonator network versus an exact algorithm.The initialization cost of setting up the resonator network is higher; even so, for large set sizes, the resonator network is faster.
resonator network is primarily intended as a proof of concept, and that further performance gains would likely result from implementing the resonator network on emerging computing platforms, as in [34], [39].

Discussion
Our study provides the first definition of residue number systems with hyperdimensional computing.The framework inherits the benefits of both systems: the carry-free arithmetic and Chinese Remainder theoretic guarantees from residue number systems, along with the robustness and computing-in-superposition properties of hyperdimensional computing [40].The framework provides a favorable way to encode, transform, and decode variables that is robust to noise.Taken together, these properties make Residue Hyperdimensional Computing an appealing framework for efficiently solving difficult computational problems, especially those involving combinatorial optimization.It also has implications for modeling population codes in the brain, in particular grid cells.
Prior work in computational neuroscience [11], [30] has emphasized that residue number systems endow grid cells with useful computational properties, including high spatial resolution, modular updates, and error correction.We demonstrate that Residue Hyperdimensional Computing successfully achieves each of these coding properties (Sections 2.2 & 2.3) in a working neural implementation.Reciprocally, our algebraic framework makes two contributions to computational neuroscience.First, we show how to extend a residue number system to a self-consistent, non-negative hexagonal coordinate system.Second, we provide a new algorithm for collectively coupling spatial position to grid cell modules via the resonator network.The core prediction our framework makes for systems neuroscience is that each grid cell module corresponds to a factor estimate in the resonator network.More specifically, each module implements a toroidal attractor network, and multiplicative couplings with hippocampus and other grid cell modules enable error correction.This prediction is consistent both with recent experimental analysis supporting the existence of continuous attractor networks in single grid cell modules [41], and with recent theories of joint attractor dynamics in hippocampus and medial entorhinal cortex [42].
The framework also has implications for how neural populations can solve difficult optimization problems, such as disentangling of visual scenes.Recent work has emphasized the promise of hyperdimensional computing as an abstraction for neuromorphic computing [34], [40], [43].Residue Hyperdimensional Computing substantially reduces the storage and average number of operations needed for solving decoding problems and combinatorial optimization, contributing a simple yet powerful improvement.In addition, the phasor representations suggested by our framework directly map onto Q-state phasor networks [44], suggesting promising implementations in spiking neural networks [45] and strategies for solving combinatorial optimization problems [46].
Finally, the performance of our framework on the subset sum problem suggests a new route for solving optimization problems with distributed representations and unconventional hardware.The subset sum problem is a particularly good fit for our framework because it is easily implemented by the Hadamard product operation on high-dimensional vectors.Since other hard problems, such as 3-SAT, can be efficiently mapped to subset sum, our results potentially point the way to a new class of parallel algorithms for efficiently solving NP-hard problems.

Definitions of algebraic operations 4.1.1 Definition of additive binding operation
We implement addition within Residue Hyperdimensional Computing by using the Hadamard product operation (component-wise multiplication, ⊙) : i.e., z(x 1 + x 2 ) = z(x 1 ) ⊙ z(x 2 ).The Hadamard product correctly implements addition because it is commutative.This can be seen as follows:

Definition of multiplicative binding operation
To implement a second binding operation (⋆), such that z( , every component of the vector z(x 1 ) must be multiplied by x 2 .If we had the value of x 2 explicitly, then we could directly implement z(x 1 • x 2 ) by component-wise exponentiation of z(x 1 ) by x 2 .However, decoding incurs additional computational costs, and we show here that multiplications can be computed without this intermediate step.
We require a few simplifying assumptions to define our multiplication operation.First, we assume that we have access to the individual base vectors for each moduli (e.g., z m 1 (x 1 )).If we do not, then we can use the resonator network to recover them.The key observation is that if x is an integer, then each component of z m k (x) is itself a m k -th root of unity.More specifically, it equals e i 2π m k r j , for some integer r j = m k 2π ϕ j x (mod m k ).Therefore, we define an operation, f , that can multiply two discrete phases when they are both drawn from the m k -th roots of unity: rs .When f is applied to two vectors of the same dimension, the multiplication is applied component-wise.Supposing that r = m k 2π ϕx 1 , and s = m k 2π ϕx 2 , we obtain rs = m k 2π ϕ 2 x 1 x 2 , which is off from our desired result by a multiplicative factor of ϕ.This motivates a final step of cancelling out this extra factor.
Because each phase ϕ is drawn from the m k -th roots of unity, it can be written as 2π m k u, where u ∈ Z (mod m k ).When m k is prime, then any non-zero integer u has a unique modular multiplicative inverse v ∈ Z (mod m k ), such that u × v = 1 (mod m k ).For example, the modular multiplicative inverse of 3 (mod 5) is 2. Consequently, f e Motivated by these reasons, we therefore assume that whenever multiplicative binding is used, every moduli m k is prime.This assumption allows us to define an "anti-base" vector, y m k , whose components are defined by the modular multiplicative inverses of These assumptions motivate the following definition of the multiplicative operation, which we show successfully performs the multiplication of the arguments along with necessary cancellations: We implement f by extracting s by taking the angle of the phasor e r by the result.We compute modular multiplicative inverses via the built-in pow function in Python.However, we note that both functions can also be implemented by lookup tables, and precomputing all input-output pairs may be optimal when many computations are re-used and lookups are inexpensive.

Decoding methods
In the context of high-dimensional distributed representations, the decoding problem is to recover a variable x from a distributed representation z(x).In all of our decoding experiments, x is either an integer or rational number.A survey of decoding methods, applied to symbolic hyperdimensional computing models, can be found in [23].

Codebook decoding
Codebook decoding estimates x by taking inner products between z(x) and a precomputed set of reference vectors: x = arg max

Resonator network details
The resonator network [24], [25] is an algorithm for factoring an input vector, z, into the primitives {z 1 , z 2 , . . ., z K } that compose it via Hadamard product binding: Each z j is specified to come from a set of candidate vectors concatenated in a codebook, Z j , and therefore the search space grows combinatorially in terms of the product of codebook sizes.The resonator network functions as a dynamical system with the following update equations: where g is a non-linearity preserving phase and discarding angles of each complex component.
In the case where we are decoding the representation of a residue number, the codebook for each factor (or moduli) Z j is composed of m j entries, which are the vector encodings for the residues [z m j (0), z m j (1), . . ., z m j (m j − 1)].In all experiments, we use an asynchronous update rule in which at each time step, only one factor estimate is updated, and every set of time steps, each vector is updated once.The algorithm runs either until convergence or until a maximum number of iterations has been reached.We consider the resonator converged when the normalized cosine similarity between two successive states exceeds a threshold, α (for all experiments, α = 0.95).

Evaluation of resonator network decoding accuracy, capacity and robustness to noise
We evaluate resonator network accuracy as a function of vector dimension (D), effective range (M ), number of moduli (K), and noise level (dependent on κ).We add noise only in experiments shown in Figure 2f, and K = 2 unless stated otherwise.D = 1024 in Figure 2d, and D = 512 in Figure 2f.To compute data points for curves that are a function of M , we generate a list of ascending primes, and select K consecutive primes as moduli.The effective range, M , is the product of these moduli.We continue experiments for a fixed D and increasingly large M until empirical accuracy falls below a given threshold (0.95 for Figures 2a and 2c, and 0.05 otherwise).To report the required number of comparisons for Figure 2a, we normalize the average number of inner product iterations by the accuracy, and visualize curves only in the high-accuracy regime (above 95 percent).

Hexagonal residue encodings
To project a 2D vector x to a 3D hexagonal coordinate y, we multiply it by a matrix Ψ: The resulting vector y = Ψ x is encoded as a high-dimensional vector using the generalization to multiple dimensions specified in Equation 5.As an additional constraint, we require that z([1, 1, 1]) = z([0, 0, 0]).This condition implements the self-cancellation property (i.e, that moving equally along the three equiangular directions cancels out).It also converts possible negative values arising from the projection step to an equivalent non-negative coordinate encoding.Somewhat fortuitously, this constraint is naturally enforced in RHC by ensuring that for each component of z, the three phases corresponding to the three directions sum to 0 (mod 2π).This is achieved by constraining the joint probability distribution over triplets of phases so that this requirement is met (Figures 3a and 3c).

Decoding with sub-integer precision 4.4.1 Extension of encoding scheme to rational numbers
For a rational number q ∈ Q, we define z m (q) based on Fractional Power Encoding, modulo m as: z m (q) = [e iϕ 1 q , e iϕ 2 q , . . ., e iϕ D q ] (7) We then form our representation of the remainders (modulo m k ) via the same process described in Equation 4. If q is an integer, then this procedure matches that of Definition 2.2.1.But in general, z m (q) ̸ = (z m ) q .This is significant because while we can still evaluate similarity via inner products and perform addition operations, multiplication operations are no longer well defined.

Sub-integer decoding with the resonator network
Sub-integer decoding with the resonator network proceeds in three steps.First, we let the resonator update its factor estimates until convergence to a fixed point.We emphasize that sub-integer encodings are also fixed points of the resonator, even when the codebooks of the resonator contain only integers (Section 2.5).Second, we find the nearest integer codebook for each moduli, and generate the nearest codebooks for the fractional values within range 1 of that decoded integer (r in total).Third, we use codebook decoding over these vectors encoding fractional values to return our result.

Evaluation of sub-integer decoding with noise
We fix D = 512, and let κ = {16.0,1.0}.We run the resonator network until a maximum number of iterations or convergence, and evaluate if both the nearest integer and nearest fractional state are correct.If so, we regard the solution as correct, reporting accuracy and bits per vector.

Measuring bits per vector
To measure the total amount of information decoded, we account for the accuracy of decoding and the number of states distinguished.The amount of information decoded for a single number (denoted as I num ) is calculated using the corresponding accuracy (a) and size (P ) of the total search space as For a detailed derivation of this equation, please refer to Section 2.2.3 of [31].According to this metric, the amount of decoded information is 0 when the accuracy is at chance (1/P ).

Visual scene factorization experiments
Convolutional sparse coding learns a dictionary of basis functions, {ϕ j (x, y)}, and infers a set of sparse latent representations, {A j (x, y)}, for each image, I(x, y), by minimizing the following energy function, E : where * denotes convolution, and λ is a hyperparameter weighting the tradeoff between reconstruction error vs. sparsity.We use the SPORCO implementation of convolutional sparse coding introduced by [47] to learn the {ϕ j (x, y)} for an ensemble of MNIST digits, and to infer the sparse representation A for each image I.
A useful feature of convolutional sparse coding is its equivariance to 2D translation; that is, 2D translation in the image domain results in 2D translation of the sparse representations, {A j (x, y)}.We can thus convert the set of sparse feature maps {A j (x, y)} to a high-dimensional vector as follows: Here h(x) and v(y) denote the RHC encodings of horizontal (x) and vertical (y) position.d j is a random vector generated i.i.d. that represents the identity of each basis function ϕ j .By expectation, most values of each A j (x, y) will be zero, because the energy function for sparse coding penalizes non-zero coefficients.Thus, the scene vector, s, can be seen as a sparse superposition of position encodings of features (ϕ i ) contained in the image.Now, we can separately define the vector encoding of each object i to be recognized as where o (i) (x, y) is the sparse representation of the image of object i within a canonical reference frame.If we were to place object i at position (x ′ , y ′ ) within an image, the resulting scene vector computed according to equation 10 will be given as Therefore, our scene analysis problem amounts to one of factorizing s into its constituent vectors h(x ′ ), v(y ′ ), and O (i) .We can factorize s using a resonator network with three codebooks, O, H, and V.Each element O (i) ∈ O consists of an encoding of each object as above, and H and V contain RHC encodings of horizontal and vertical position.
For our object examples, we use 10 images from the MNIST dataset.Sparse coding dictionary elements are optimized over a subset of the MNIST dataset.After inferring a sparse code for each image, we encode it as a high-dimensional vector (D = 10, 000).We use a residue number system with bases {3, 5, 7} for both horizontal and vertical dimension and then either enumerate all 105 codebooks for a single factor (Standard) or use 3 factors with 3, 5, and 7 codebooks, respectively (Residue).In either case, we ran the resonator network until convergence to a vector matching the scene representation (including reinitialization, if it did not converge after a fixed number of iterations or became stuck in a local minima), and record the average number of iterations multiplied by the average number of codebook evaluations (which is smaller for the residue encoding).

Subset sum experiments
We use a residue number system with 3 moduli, {m − 1, m, m + 1}, where m is a positive integer, ensuring that our moduli are co-prime.To generate random subset sum problems, we first define a maximum sum range to be M/2.For Figures 6b and 6c, m = 200, M ≈ 200 3 .Then, we draw random variables from a uniform distribution (scaled between 0, and half of the maximum sum over the largest set size tested).We then select a random subset of the set (all subsets are equally likely) and compute the sum.This sum forms the input to the resonator network, and we treat its solution is correct if it converged to the same sum.If the resonator network returns the wrong output, we restart it from a different random initialization, up to a maximum number of trials.We vary both the vector dimension (D) and set size (|S|), reporting accuracy after multiple simulations.For Figure 6d, D = 400.
To compare the number of evaluations relative to brute force (Figure 6e), we record the average number of evaluations on each set size.We divide the number of inner product comparisons required for brute force evaluation by the number of comparisons per resonator network iteration.Further, we normalize the number of resonator iterations by the accuracy to ensure a fair comparison.In comparing our algorithm to a solver, we implement an exact subset-sum algorithm as a baseline [48].We let m = 1,000, D = {10,000, 20,000}, and draw integers uniformly from the range [0, 5000].

Supplemental material A A brief survey of distributed coding schemes
In order to process vector representations of numbers, such as in machine learning settings [49]- [54], previous work combined hyperdimensional computing with different kinds of locality-preserving encodings for representing numeric data with vectors.The requirement to be locality-preserving is that inner products between vectors encode similarity of the underlying data.Here we briefly review some locality-preserving encoding schemes that have been used in the past (see also [55]), assessing their kernel properties.A.1 The thermometer code The thermometer code [4], [56], [59], [60] is a simple and structured way to form a localitypreserving encoding for a range of discrete levels s, s ∈ [0, D].The first code z(0) consists of all -1s.For other levels, the components of z(s) are determined as: Thus, the last code z(D) consists of all +1s, and, in total, the thermometer code can represent D + 1 levels.Figure S.1 shows how cosine similarity appears for several different levels when D = 50.Thermometer codes produce a translation-invariant kernel that is triangular and has a width of 2D + 1 levels.It is a nonlocal kernel, in the sense that there are no two points in the encoding range that have a similarity of zero.In practice, thermometer codes are used commonly when applying hyperdimensional computing to classification problems.
A.2 The float code The float code, also known as the sliding code, [57], [59] addresses the issue of the thermometer code, i.e., that the similarity decay is not local.This is done by using w consecutive +1 components ("float") where the size of w regulates similarity characteristics of the code.For the binary case, the similarity kernel of the float code is the triangular kernel of width 2w + 1 levels.To encode the lowest value z(0), the first w components of the vector are set to +1s while the rest of the components are 0s.In general, the components of z(s) are determined as:  Scatter codes [58], [59], [61] are another alternative to form a locality-preserving encoding where similarity decays nonlinearly.In scatter codes, the code for the first level z(0) is chosen randomly while each subsequent code is obtained from the previous one by randomly swapping its components with some probability p: z i (s) = −z i (s − 1), r i ≤ p z i (s − 1), otherwise (S.3)

B Kernel properties of Residue Hyperdimensional Computing
To show that Fractional Power Encoding (modulo m) results in approximation of a particular periodic kernel with period m, we observe that our probability distribution can be written as a Dirac comb function pointwise multiplied by a rect function.This fact becomes useful when we see that our kernel approximates a Fourier integral.Letting x = x 1 − x 2 , we take the following steps to show convergence in the infinite-dimensional limit: Thus, our kernel is a 'sinc comb' function: a sum of sinc functions spaced with a period of m.This result is particularly notable because sinc evaluates to 0 for integers that are not a multiple of m and means that distinct integers (and remainders) are orthogonal in the high-dimensional space.
To simplify the equation even further, we can derive a sum-less expression for an infinite number of sinc functions.We need to consider two cases: 1) x = ms for some s ∈ Z, and 2) x ̸ = ms for all s ∈ Z.
Case 1 is straightforward.Without loss of generality, let x be the value for which x−ms = 0. Then we have: We confirm our result by comparing our analytic values for both cases to the kernel induced by a high-dimensional vector (Figure S.4).In the limit of m → ∞, both kernels converge to the sinc function: K(x) = sin(πx) πx .

Figure 1 :
Figure 1: Residue Hyperdimensional Computing defines a kernel separating different remainder values, and it enables algebraic operations.a, For fractional power encoding, modulo m = 5, inner products between vectors reflect the similarity of points with the same remainder value and are quasi-orthogonal elsewhere.The light blue curve shows the kernel shape when ∆x is a real-valued scalar; integers occur approximately at zero crossings.A further derivation is provided in Supplement B. b, The kernel induced by an RHC vector (green) is the product kernel of the moduli used to form it (orange, blue).c, Demonstration of addition and multiplication.Blue and orange show encodings of 2 and 3, respectively.Teal shows the decoded value of 2+3 (i.e., 5); purple shows decoded value of 2 × 3 (i.e., 6).

Figure 2 :
Figure 2: The resonator network can efficiently recover moduli of different numbers.a, The resonator network outperforms codebook decoding by over an order of magnitude when the effective range is within resonator network capacity.b, Demonstration of resonator network capacity.For a fixed vector dimension, accuracy remains high up to a given range, before gradually falling off.c, Scaling of resonator network capacity, C, as a function of dimension, D, is well-described by a quadratic fit (orange dashed line; cf.linear fit in blue dash-dotted line).Quadratic fit coefficients are (1.3 × 10 −1 )D 2 + (2.4 × 10 1 )D − 3.7 × 10 3 , linear fit coefficients are (7.6 × 10 2 )D − 7.6 × 10 5. d, Resonator network performance is slightly worse for a higher number of moduli, K, but e, an advantage of lower moduli is a much higher effective range given encoding resources.f, The capacity of the resonator network remains high even in the presence of large amounts of phase noise.

Figure 4 :
Figure 4: The resonator network enables retrieval of fractional (sub-integer) values.a, Example of the resonator network converging for z(x), x = 40.4.b, Inner product values decoded by the resonator network are predicted by fractional offsets from a Dirac comb convolved with a sinc function.c, Sub-integer encoding accuracy under a low noise regime for a RHC vector (r denotes number of sub-integer partitions).d, Bits per vector for different fractional decodings.e, f, same as c and d, respectively, but under higher noise conditions.

Figure 5 :
Figure 5: Residue Hyperdimensional Computing enables efficient disentangling of images.a, Processing pipeline: an image is first represented in terms of its local shape features via convolutional sparse coding, then converted to a high-dimensional vector z by superimposing the residue number encodings of the positions of each feature in the image, which is finally factorized into object identity and position via a resonator network.b, Simulations of the resonator network on visual scenes without the residue number encoding (Standard) and with the residue number encoding (Residue).c, With a residue number system, the resonator network requires less memory overhead (40 vs. 220 codebook vectors) and less total computation to converge to the correct solution (792.4 vs. 2085.6average codebook evaluations).Error bars show the 25 th and 75 th percentile of the number of evaluations.

Figure 6 :
Figure 6: The resonator network, with Residue Hyperdimensional Computing, enables successful searches for solutions to the subset sum problem.a, Demonstration of the resonator network on the subset sum problem, with S = {18, 4, 5, 10, 2, 23}.The resonator network converges to the correct solution after a few iterations.b, Performance of the resonator network on randomly selected subset sum problems with a fixed set size.c, The probability of success scales as expected with independent trials on each start.d, Success of the resonator network on the subset sum problem depends on the range of items indexed in the subset sum problem (larger ranges are harder to factorize).e, Performance of the resonator network in terms of the expected number of iterations compared to chance.The expected number of iterations for the resonator network scales favorably compared to brute force search and improves with higher encoding dimension.f, Comparison of average compute time of the resonator network versus an exact algorithm.The initialization cost of setting up the resonator network is higher; even so, for large set sizes, the resonator network is faster.

Figure S. 1 :
Figure S.1: Similarity kernel of the thermometer code shown for several levels; D was set to 50.

Figure S. 2 :
Figure S.2: Similarity kernel of the float code shown for several levels; D was set to 60 while w was set to 10.

Figure S. 2
Figure S.2 depicts how similarity (inner product normalized by w) decays for several levels in the float code for D = 60, w = 10.The float code also produces a triangular kernel, but in contrast to the thermometer code, it allows controlling the width of the triangular kernel.The number of levels it could encode is still limited and equals to n − w + 1.

Figure S. 3 :
Figure S.3: Similarity kernel of a scatter code; D was set to 1000, p was 0.05.The values of similarities were averaged over 50 random initializations of the code.

Figure S. 4 :
Figure S.4: The analytic kernel expected by dashed lines matches the approximate kernel generated by a random vector of sufficiently high dimension (D=50,000).a, match for an odd modulus (m = 5), b, match for an even modulus (m = 6).

Table S .
1: Existing high-dimensional vector-based schemes for encoding numbers (first five rows) in comparison to our proposed framework (last row).